Interdomain Dynamics via Paramagnetic NMR on the Highly Flexible Complex Calmodulin/Munc13-1

Paramagnetic NMR constraints are very useful to study protein interdomain motion, but their interpretation is not always straightforward. On the example of the particularly flexible complex Calmodulin/Munc13-1, we present a new approach to characterize this motion with pseudocontact shifts and residual dipolar couplings. Using molecular mechanics, we sampled the conformational space of the complex and used a genetic algorithm to find ensembles that are in agreement with the data. We used the Bayesian information criterion to determine the ideal ensemble size. This way, we were able to make an accurate, unambiguous, reproducible model of the interdomain motion of Calmodulin/Munc13-1 without prior knowledge about the domain orientation from crystallography.


Iterative Weighting
We devised an iterative approach to determine the relative scaling factor of RDCs and PCSs. First, we fitted RDCs and PCSs separately to the structural model, and estimated the associated error by the RMSD of experimental against back-calculated data. We then used these RMSDs as scaling factors to simultaneously fit PCSs and RDCs to the model. From this simultaneous fit, we again determined the RMSDs for PCSs and RDCs, and used them as scaling factors for a new simultaneous fit. This cycle was repeated until the RMSDs did not change any further, which was typically achieved after 3 to 5 iterations. The RMSDs used for scaling converged to 44 ppb for PCSs and 4.6 Hz for RDCs, yielding a relative scaling factor of 105 Hz ppm −1 . We did not distinguish between different types of PCSs/RDCs due to the lack of sufficient number of data in the N-terminal domain.
Due to paramagnetic broadening as well as structural noise it is conceivable that there is a dependence of the uncertainty in PCS on the distance to the paramagnetic center. In the case of structural noise small uncertainties in atom position lead to inaccuracies in the backcalculated PCSs, and the magnitude of these inaccuracies will scale with the inverse third power of the distance to the lanthanide. To check for such dependencies we have plotted the difference between experimental and back-calculated PCSs as a function of the lanthanide distance to the third inverse power (Fig. S1a). We have then binned these points in different ranges and calculated the RMS of the deviations as a measure of the PCS uncertainty in this region (Fig. S1b). Although there is a small trend of increasing uncertainty with Ln discernible, we judged it as a minor contribution and therefore chose not to include a distance-dependent weighting factor for the evaluation of PCSs. (b) Figure S1: (a) Absolute deviation of experimental and back-calculated PCSs (∆δ PCS ) against r −3 Ln . (b) RMS of ∆δ PCS in given ranges of r −3 Ln . The step size was changed from 0.1 nm 3 to 0.5 nm 3 above 0.5 nm 3 due to the scarcity of data. Uncertainties were calculated using bootstrapping. Note that the last bin contains only two points, making its value highly uncertain beyond what bootstrapping can capture.

Metal-Specific Tensors and Order Parameters
To compare the similarity of two susceptibility tensors, one can represent these tensors as the vector χ of their five independent components (χ = (∆χ ax ∆χ rh ∆χ xy ∆χ xz ∆χ yz ) T ).
Then one computes the normalized scalar products between two such vectors, which are shown in Table S1. To investigate how the different metal tensors scale differently under dynamic conditions, we generated 1 000 000 ensembles with three uniformly random orientations. We then averaged the different metal tensors using these three orientations and calculated the order parameter S as the scaling factor of the tensor eigenvalues. Then, we calculated pairwise ratios S 1 /S 2 for each of the different metals. The distribution of these ratios for all 1 000 000 random ensembles is shown in Fig. S2. As expected, the distribution is narrower for pairs with similar tensors (e.g., Er/Tm), but even for very different tensors it is rare to see order parameter ratios beyond 2 (e.g., 2 % for Dy/Tm). Additionally, we have constructed random ensembles with certain constraints and again looked at the ratio of metal-specific order parameters to see whether higher ratios could be obtained this way.

Conformational Search
The conformational search was executed as a Monte Carlo torsional sampling of certain bond torsion angles. We chose the two backbone dihedrals (ϕ, ψ) of the residues 76 to 81 in calmodulin's linker region to be sampled in the search, which corresponds to the region where Tjandra et al. have found increased mobility. 2 Sampling a larger stretch of the linker lead to distortions of the adjacent α-helices, so we limited the search to this relatively small range of residues. Each sampling step changed a random selection of up to six of the aforementioned torsion angles by up to 10°, followed by a minimization of the resulting structure.   These served as a basic pool of conformations in the further analysis.

Homogeneous Coordinates
The relative orientation of the two domains and the necessary transformations were implemented using homogeneous coordinates, which are a convenient way to simultaneously express translation and rotation. For regular three-dimensional coordinates translation is expressed as vector addition and rotation as matrix multiplication. Consider a point P with position vector p. Translation and rotations are computed as follows: Where q is a translation vector and R a rotation matrix (i.e., R T R = 1 and |R| = 1).
However, it is not obvious how to pack these two operations into one, and homogeneous coordinates provide a solution to this problem. They are generated by adding an additional coordinate w, and for any point (x, y, z) the tuple (xw, yw, zw, w) is a set of homogeneous coordinates of this point. Any choice of w refers to the same point, so it is typically set to 1.
In these coordinates, both translation and rotation can be expressed as matrix multiplication, as is shown below: With this it is possible to compute a matrix representing two or more concatenated translations and rotations by simply taking the matrix product of the individual transformation matrices. Using this formalism we expressed each conformation as a 4 × 4-matrix, which described the transformation of the C-terminal domain of some reference structure (in our case, 2BE6/B) to the desired location. The transformation matrix was found as the trans-formation that minimizes the distance RMSD between their backbone atoms (N, C α , C ′ ) of residues 84 to 145, so both structures were represented as the collection of these atom positions. First, both the reference and the conformation were translated such that the mean of all atom positions would lay in the origin, with the translation matrices T ref and T conf . From there it is a pure rotation R that transforms the reference into the conformation, which can be found using the Kabsch algorithm. 3 The total transformation matrix M is then found by concatenating these three operations: With these matrices and the knowledge about a single reference structure, any interdomain orientation could be reconstructed.
As an additional feature of this formalism, it is possible to construct translation-invariant vectors by setting the coordinate w to zero (i.e., (x, y, z, 0) T ). This is useful if one has already computed internuclear vectors for RDCs within the C-terminal domain, which are only affected by domain rotation, but not by domain translation.

Population Fitting: Constraints
The constraint that the sum of populations p i equals to one can be solved by adding an additional, synthetic data point with a very high value (in our case 10 6 ) that is the same for both the experimental as well as for all predicted data sets. The system of linear equations then has the following matrix form: x 11 x 12 · · · x 1nens x 21 x 22 · · · x 2nens . . . . . . . . . . . .
x exp,n data It is easy to see that the equation for the last row in Eq. (S4) is equivalent to the constraint of p i = 1, and due to the high value small deviations from this lead to a steep increase in the minimization criterion (i.e., RMSD).

9
Genetic Algorithm: Details The new generation was constructed from the old population as follows: the best 5 % are transferred to the next generation unaltered. This procedure, commonly referred to as elitism, ensures that the best fitness never worsens over multiple generations. Next, a fraction of 20 % of the old generation is selected at random as possible parents; however, the probability for selection p i parent depends on the individual's fitness f i : This ensures that fitter individuals have a higher chances of generating offspring, but it also allows certain poorer individuals to reproduce, which helps to prevent premature convergence on local minima. The parameter f 0 = 0.03 is a measure of how strongly the reproduction is dependent on fitness.
The newly generated individuals can be a mixture of two parents, with the crossover probability p co = 0.5, or descend from a single parent. In the former case they will inherit a subset of conformations from one parent and another subset from the other parent, where the relative sizes of these subsets (i.e., the crossover point) is uniformly random. In the case of a single parent all conformations will be inherited from this parent. For each new individual, the parent(s) are chosen randomly and uniformly from the set of all possible parents. In either case these offspring are subjected to mutation, where each conformation has a probability of p mut = 0.1 to be exchanged with an arbitrary conformation from the overall pool at random.
Most of the algorithm's parameters were chosen based on reasonable guesses and not strictly optimized. In trial runs we did not observe a strong dependence of the algorithm's performance on these parameters and therefore we decided not to conduct a more thorough optimization. The parameter f 0 was chosen based on what difference in f is significant (approximately 2σ, details see below). In preliminary runs we determined the optimal effective ensemble size n eff ens to be around 8 to 12, so we set the ensemble size n ens , which acts as an upper limit, to 20. The population size and the number of generations are limited by the computational power, and we set both to 1000, which yielded a runtime in the order of an hour. Finally, we ran 5994 independent repetitions of the genetic sampling (This seemingly arbitrary number is a consequence of a fixed computing time of three days).  Figure S5: HNCA{no H} pulse sequence. All annotated parameters were identical to the regular HNCA (Fig. S4). Figure S6: HNCO{no H} pulse sequence. All annotated parameters were identical to the regular HNCO (Fig. S4).

Data Collection
Most numerical and other digital data are too bulky to be printed, and are more conveniently provided as compressed archive of data files. Please to open the archive, or access it via the list of attachments of your pdf viewer.